candidate choice


Defining and Evaluating Decision and Composite Risk in Language Models Applied to Natural Language Inference

Shen, Ke, Kejriwal, Mayank

arXiv.org Artificial Intelligence

Despite their impressive performance, large language models (LLMs) such as ChatGPT are known to pose important risks. One such set of risks arises from misplaced confidence, whether over-confidence or under-confidence, in the model's inference. While the former is well studied, the latter is not, leading to an asymmetry in understanding the comprehensive risk of a model based on misplaced confidence. In this paper, we address this asymmetry by defining two types of risk (decision and composite risk) and proposing an experimental framework consisting of a two-level inference architecture and appropriate metrics for measuring such risks in both discriminative and generative LLMs. The first level relies on a decision rule that determines whether the underlying language model should abstain from inference. The second level (which applies if the model does not abstain) is the model's inference itself. Detailed experiments on four natural language commonsense reasoning datasets, using both an open-source ensemble-based RoBERTa model and ChatGPT, demonstrate the practical utility of the evaluation framework. For example, our results show that our framework can get an LLM to confidently respond to an extra 20.1% of low-risk inference tasks that other methods might misclassify as high-risk, and to skip 19.8% of high-risk tasks that would have been answered incorrectly.
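The two-level architecture described above can be sketched as a confidence-thresholded decision rule. The threshold value, function names, and probabilities below are illustrative assumptions for exposition, not the paper's calibrated decision rule or risk metrics.

```python
import numpy as np

def two_level_inference(probs, tau=0.6):
    """Level 1: abstain if the model's top confidence falls below tau.
    Level 2: otherwise, answer with the most confident choice.
    `tau` is an illustrative threshold, not the paper's learned rule."""
    probs = np.asarray(probs, dtype=float)
    if probs.max() < tau:
        return None  # abstain: risk of answering deemed too high
    return int(probs.argmax())  # commit to the top-scoring choice

# A confident distribution is answered; a flat one triggers abstention.
print(two_level_inference([0.1, 0.8, 0.1]))    # -> 1
print(two_level_inference([0.4, 0.35, 0.25]))  # -> None
```

Decision risk then corresponds to abstaining on tasks the model would have answered correctly (or answering ones it should have skipped), which this rule makes directly measurable.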


Computational-level Analysis of Constraint Compliance for General Intelligence

Wray, Robert E., Jones, Steven J., Laird, John E.

arXiv.org Artificial Intelligence

Human behavior is conditioned by codes and norms that constrain action. Rules, "manners," laws, and moral imperatives are examples of classes of constraints that govern human behavior. These systems of constraints are "messy": individual constraints are often poorly defined, the constraints relevant in a particular situation may be unknown or ambiguous, constraints interact and conflict with one another, and determining how to act within the bounds of the relevant constraints may be a significant challenge, especially when rapid decisions are needed. Despite such messiness, humans incorporate constraints into their decisions robustly and rapidly. General, artificially intelligent agents must also be able to navigate the messiness of systems of real-world constraints in order to behave predictably and reliably. In this paper, we characterize sources of complexity in constraint processing for general agents and describe a computational-level analysis of such constraint compliance. We identify key algorithmic requirements based on the computational-level analysis and outline an initial, exploratory implementation of a general approach to constraint compliance.


Choice Fusion as Knowledge for Zero-Shot Dialogue State Tracking

Su, Ruolin, Yang, Jingfeng, Wu, Ting-Wei, Juang, Biing-Hwang

arXiv.org Artificial Intelligence

With the demanding need for deploying dialogue systems in new domains at lower cost, zero-shot dialogue state tracking (DST), which tracks a user's requirements in task-oriented dialogues without training on the desired domains, is drawing increasing attention. Although prior works have leveraged question-answering (QA) data to reduce the need for in-domain training in DST, they fail to explicitly model knowledge transfer and fusion for tracking dialogue states. To address this issue, we propose CoFunDST, which is trained on domain-agnostic QA datasets and directly uses candidate choices of slot-values as knowledge for zero-shot dialogue-state generation.

Nowadays, the requirements of deploying an increasing number of services across a variety of domains raise challenges for DST models in production [4]. However, existing dialogue datasets only span a few domains, making it impossible to train a DST model on all conceivable conversation flows [5]. Furthermore, dialogue systems are required to infer dialogue states with dynamic techniques and offer diverse interfaces for different services. Despite the fact that the copy mechanism [6] and dialogue acts [7] have been leveraged to efficiently track slots and values in the dialogue history, the performance of DST still relies on a large number of annotations of dialogue states, which are expensive and inefficient to collect for every new domain and service.
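As a rough illustration of using candidate slot-value choices as knowledge, one zero-shot DST step can be sketched as scoring each candidate value against the dialogue context and keeping the best one. The scorer, names, and threshold here are assumptions standing in for a QA-trained model, not CoFunDST itself.

```python
import numpy as np

def select_slot_value(score_fn, context, slot, candidates):
    """Zero-shot DST sketch: treat each candidate slot-value as a QA
    answer choice, pick the best-scoring one, and fall back to 'none'
    when no candidate is supported by the context at all."""
    scores = [score_fn(context, f"{slot} = {v}") for v in candidates]
    best = int(np.argmax(scores))
    return candidates[best] if scores[best] > 0 else "none"

# Toy lexical-overlap scorer as a stand-in for the QA model.
overlap = lambda ctx, hyp: len(set(ctx.lower().split()) & set(hyp.lower().split()))
ctx = "i need a cheap hotel in the north"
print(select_slot_value(overlap, ctx, "price range", ["cheap", "expensive"]))  # -> cheap
print(select_slot_value(overlap, ctx, "parking", ["yes", "no"]))               # -> none
```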


Understanding Prior Bias and Choice Paralysis in Transformer-based Language Representation Models through Four Experimental Probes

Shen, Ke, Kejriwal, Mayank

arXiv.org Artificial Intelligence

Recent work on transformer-based neural networks has led to impressive advances on multiple-choice natural language understanding (NLU) problems, such as Question Answering (QA) and abductive reasoning. Despite these advances, there is still limited work on understanding whether these models respond to perturbed multiple-choice instances in a sufficiently robust manner to be trusted in real-world situations. We present four confusion probes, inspired by similar phenomena first identified in the behavioral science community, to test for problems such as prior bias and choice paralysis. Experimentally, we probe a widely used transformer-based multiple-choice NLU system using four established benchmark datasets. We show that the model exhibits significant prior bias and, to a lesser but still highly significant degree, choice paralysis, in addition to other problems. Our results suggest that stronger testing protocols and additional benchmarks may be necessary before such language models are used in front-facing systems or in decision making with real-world consequences.
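A minimal version of a prior-bias probe can be sketched as checking how often a scorer's choice survives blanking out the question: if the answer rarely changes, the model is leaning on choice-only priors rather than the question. The scorer and data below are toy assumptions, not the paper's benchmarks or exact protocol.

```python
import numpy as np

def prior_bias_rate(score_fn, instances):
    """Fraction of instances where the top choice is unchanged when the
    question is replaced by an empty string. `score_fn(question, choices)`
    returns one score per choice and stands in for any multiple-choice
    NLU scorer."""
    same = 0
    for question, choices in instances:
        full = np.argmax(score_fn(question, choices))
        blank = np.argmax(score_fn("", choices))
        same += int(full == blank)
    return same / len(instances)

# Toy scorer that ignores the question entirely: maximal prior bias.
length_scorer = lambda q, choices: [len(c) for c in choices]
data = [("Why is the sky blue?", ["scattering", "magic"]),
        ("What is 2+2?", ["four", "a banana"])]
print(prior_bias_rate(length_scorer, data))  # -> 1.0
```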


Fuzzy Forests For Feature Selection in High-Dimensional Survey Data: An Application to the 2020 U.S. Presidential Election

Dey, Sreemanti, Alvarez, R. Michael

arXiv.org Machine Learning

An increasingly common methodological issue in the social sciences is high-dimensional, highly correlated datasets that are not amenable to the traditional deductive framework of study. Analysis of candidate choice in the 2020 presidential election is one area in which this issue presents itself: to test the many theories explaining the outcome of the election, it is necessary to use data such as the 2020 Cooperative Election Study Common Content, with hundreds of highly correlated features. We present the Fuzzy Forests algorithm, a variant of the popular Random Forests ensemble method, as an efficient way to reduce the feature space in such cases with minimal bias, while also maintaining predictive performance on par with common algorithms like Random Forests and logit. Using Fuzzy Forests, we isolate the top correlates of candidate choice and find that partisan polarization was the strongest factor driving the 2020 presidential election. Social science research today often encounters a difficult methodological situation: larger and larger datasets containing high-dimensional, highly correlated features [7]. As in the application we discuss in this paper (the 2020 U.S. presidential election), to test the many different theories and potential explanations for why voters decided to remove then-President Trump from office, researchers need methodologies that can quickly and efficiently reduce the feature space from hundreds of possible features to a smaller set that can then be the focus of further study. In this paper we present Fuzzy Forests, a variant of the popular Random Forests algorithm, which we argue is well suited to exactly this type of applied machine learning problem [6]. Fuzzy Forests is ideal for feature selection in large, high-dimensional datasets where the features are highly correlated.
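The screening idea behind Fuzzy Forests can be sketched as keeping only the top-ranked features within each module of correlated features, then studying the survivors. The correlation-based importance below is a simple stand-in for the random-forest importance the actual algorithm uses, and all names and data are illustrative assumptions.

```python
import numpy as np

def modulewise_screen(X, y, modules, importance_fn, keep_frac=0.25):
    """Sketch of a module-wise screening step: within each module of
    correlated features, rank features by an importance function and
    keep only the top fraction. `importance_fn(Xm, y)` returns one
    score per column of Xm; here it is a placeholder for a
    random-forest importance measure."""
    survivors = []
    for module in modules:
        Xm = X[:, module]
        scores = importance_fn(Xm, y)
        k = max(1, int(len(module) * keep_frac))
        top = np.argsort(scores)[::-1][:k]       # best-scoring features first
        survivors.extend(int(module[i]) for i in top)
    return sorted(survivors)

# Toy example: |correlation with y| stands in for RF importance.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 2.0 * X[:, 1] + 1.5 * X[:, 4] + rng.normal(size=200)
corr_importance = lambda Xm, y: np.abs(
    [np.corrcoef(Xm[:, j], y)[0, 1] for j in range(Xm.shape[1])])
modules = [np.array([0, 1, 2]), np.array([3, 4, 5])]
print(modulewise_screen(X, y, modules, corr_importance, keep_frac=0.34))  # -> [1, 4]
```

Screening per module, rather than globally, is what keeps strongly correlated features from crowding each other out of the final candidate set.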


Exploiting Sentence Embedding for Medical Question Answering

Hao, Yu, Liu, Xien, Wu, Ji, Lv, Ping

arXiv.org Artificial Intelligence

Despite the great success of word embedding, sentence embedding remains far from a solved problem. In this paper, we present a supervised learning framework that exploits sentence embedding for the medical question answering task. The framework consists of two main parts: 1) a sentence-embedding module, and 2) a scoring module. The former is developed with contextual self-attention and multi-scale techniques to encode a sentence into an embedding tensor; we call this module Contextual self-Attention Multi-scale Sentence Embedding (CAMSE) for short. The latter employs two scoring strategies: Semantic Matching Scoring (SMS) and Semantic Association Scoring (SAS). SMS measures similarity, while SAS captures the association between sentence pairs: a medical question concatenated with a candidate choice, and a piece of corresponding supportive evidence. The proposed framework is evaluated on two Medical Question Answering (MedicalQA) datasets collected from real-world applications: medical exams and clinical diagnosis based on electronic medical records (EMR). The comparison results show that our proposed framework achieves significant improvements over competitive baseline approaches. Additionally, a series of controlled experiments illustrates that the multi-scale strategy and the contextual self-attention layer play important roles in producing effective sentence embeddings, and that the two scoring strategies are highly complementary for question answering problems.
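The two scoring strategies can be roughly sketched on pooled embeddings: a cosine similarity for SMS and a learned bilinear form for SAS. The mean pooling, random weights, and function names are assumptions for illustration; the paper defines both scores on CAMSE embedding tensors, not on pooled vectors.

```python
import numpy as np

def sms_score(q_emb, e_emb):
    """Semantic Matching Scoring sketch: cosine similarity between
    pooled sentence embeddings of the question+choice pair and the
    evidence sentence."""
    q, e = q_emb.mean(axis=0), e_emb.mean(axis=0)
    return float(q @ e / (np.linalg.norm(q) * np.linalg.norm(e) + 1e-9))

def sas_score(q_emb, e_emb, W):
    """Semantic Association Scoring sketch: a bilinear form q^T W e
    capturing association rather than raw similarity; W would be
    learned in the real model and is random here."""
    q, e = q_emb.mean(axis=0), e_emb.mean(axis=0)
    return float(q @ W @ e)

rng = np.random.default_rng(1)
q = rng.normal(size=(7, 16))    # question + candidate choice, 7 tokens
ev = rng.normal(size=(12, 16))  # supportive evidence sentence, 12 tokens
W = rng.normal(size=(16, 16))
print(round(sms_score(q, ev), 3), round(sas_score(q, ev, W), 3))
```

Because SMS is bounded similarity and SAS is an unconstrained learned association, the two signals are naturally complementary when combined for answer scoring.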